Part-of-speech Tagging of Code-Mixed Social Media Text

نویسندگان

  • Souvick Ghosh
  • Satanu Ghosh
  • Dipankar Das
چکیده

A common step in the processing of any text is the part-of-speech tagging of the input text. In this paper, we present an approach to tackle code-mixed text from three different languages Bengali, Hindi, and Tamil apart from English. Our system uses Conditional Random Field, a sequence learning method, which is useful to capture patterns of sequences containing code switching to tag each word with accurate part-of-speech information. We have used various pre-processing and post-processing modules to improve the performance of our system. The results were satisfactory, with a highest of 75.22% accuracy in Bengali-English mixed data. The methodology that we employed in the task can be used for any resource poor language. We adapted standard learning approaches that work well with scarce data. We have also ensured that the system is portable to different platforms and languages and can be deployed for real-time analysis.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments

We discuss Part-of-Speech(POS) tagging of Hindi-English Code-Mixed(CM) text from social media content. We propose extensions to the existing approaches, we also present a new feature set which addresses the transliteration problem inherent in social media. We achieve an 84% accuracy with the new feature set. We show that the context and joint modeling of language detection and POS tag layers do...

متن کامل

SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

Use of social media has grown dramatically fast during the past few years. Users usually follow informal languages in communicating through social media. This language of communication is often mixed in nature, where people transcribe their regional language with English. This technique of writing is increasing its popularity rapidly. Natural language processing (NLP) aims to infer the informat...

متن کامل

A CRF Based POS Tagger for Code-mixed Indian Social Media Text

In this work, we describe a conditional random fields (CRF) based system for Part-OfSpeech (POS) tagging of code-mixed Indian social media text as part of our participation in the tool contest on POS tagging for codemixed Indian social media text, held in conjunction with the 2016 International Conference on Natural Language Processing, IIT(BHU), India. We participated only in constrained mode ...

متن کامل

Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text

This paper describes Centre for Development of Advanced Computing’s (CDACM) submission to the shared task’Tool Contest on POS tagging for CodeMixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text’, collocated with ICON-2016. The shared task was to predict Part of Speech (POS) tag at word level for a given text. The codemixed text is generated mostly on social media by multilingual us...

متن کامل

Part-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON 2015

This paper discusses the experiments carried out by us at Jadavpur University as part of the participation in ICON 2015 task: POS Tagging for Code-mixed Indian Social Media Text. The tool that we have developed for the task is based on Trigram Hidden Markov Model that utilizes information from dictionary as well as some other word level features to enhance the observation probabilities of the k...

متن کامل

Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages

The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-ofspeech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Condi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016